Entities, as important carriers of real-world knowledge, play a key role in many NLP tasks. We focus on incorporating entity knowledge into an encoder-decoder framework for informative text generation. Existing approaches tried to index, retrieve, and read external documents as evidence, but they suffered from a large computational overhead. In this work, we propose an encoder-decoder framework with an entity memory, namely EDMem. The entity knowledge is stored in the memory as latent representations, and the memory is pre-trained on Wikipedia along with encoder-decoder parameters. To precisely generate entity names, we design three decoding methods to constrain entity generation by linking entities in the memory. EDMem is a unified framework that can be used on various entity-intensive question answering and generation tasks. Extensive experimental results show that EDMem outperforms both memory-based auto-encoder models and non-memory encoder-decoder models.
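Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach to knowledge-intensive tasks is a retrieve-then-read pipeline, which first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead): it first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. Furthermore, we propose a clustering-based prompting method that selects distinct prompts, so that the generated documents cover different perspectives, leading to better recall of acceptable answers. We conduct extensive experiments on three different knowledge-intensive tasks: open-domain QA, fact checking, and dialogue systems. Notably, GenRead achieves exact match scores of 71.6 and 54.4 on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source. Finally, we demonstrate that model performance can be further improved by combining retrieval and generation.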
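The generate-then-read flow described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate_then_read`, the `generate` and `read` callables, and the `{question}` template placeholder are all hypothetical names, and the clustering-based prompt selection is assumed to have already produced the list of prompts.

```python
from typing import Callable, List


def generate_then_read(
    question: str,
    prompts: List[str],
    generate: Callable[[str], str],
    read: Callable[[str, List[str]], str],
) -> str:
    """Generate one context document per prompt, then read them to answer.

    `prompts` are templates containing a `{question}` placeholder (an
    assumption of this sketch). `generate` stands in for a large language
    model call and `read` for a reader model; both are supplied by the caller.
    """
    # Step 1: prompt the generator once per (ideally diverse) prompt template.
    documents = [generate(p.format(question=question)) for p in prompts]
    # Step 2: the reader conditions on the question plus all generated documents.
    return read(question, documents)
```

Passing the generator and reader as callables keeps the sketch independent of any particular model API; no retrieval step or external corpus appears anywhere in the flow.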
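Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS). However, continuously growing and redundant template features lead to inefficient inference. To alleviate this, we propose a novel Sequential Weighted Expectation-Maximization (SWEM) network that greatly reduces the redundancy of memory features. Unlike previous methods, which only detect feature redundancy between frames, SWEM merges both intra-frame and inter-frame similar features by leveraging a sequential weighted EM algorithm. Further, adaptive weights for frame features give SWEM the flexibility to represent hard samples, improving the discriminability of the templates. In addition, the proposed method maintains a fixed number of template features in memory, which ensures stable inference complexity for the VOS system. Extensive experiments on the commonly used DAVIS and YouTube-VOS datasets verify the high efficiency (36 FPS) and high performance (84.3% $\mathcal{J}\&\mathcal{F}$) of SWEM. Code is available at: https://github.com/lmm077/swem.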
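Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated with enormous human labor or crawled from the internet and then processed with elaborate data cleaning techniques. To reduce the dependency on well-aligned image-text pairs, it is promising to directly leverage large-scale text-only and image-only corpora. This paper proposes a data augmentation method, cross-modal CutMix (CMC), for implicit cross-modal alignment learning in unpaired VLP. Specifically, CMC transforms natural sentences from the textual view into a multi-modal view, in which visually grounded words in a sentence are randomly replaced by diverse image patches with similar semantics. The proposed CMC has several appealing properties. First, it enhances data diversity while keeping the semantic meaning intact, which addresses the scarcity of aligned data. Second, by attaching cross-modal noise to uni-modal data, it guides models to learn token-level interactions across modalities for better denoising. Furthermore, we present a new unpaired VLP method, dubbed VLMixer, that integrates CMC with contrastive learning to pull together the uni-modal and multi-modal views for better instance-level alignment among different modalities. Extensive experiments on five downstream tasks show that VLMixer surpasses previous state-of-the-art unpaired VLP methods.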
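The word-to-patch replacement at the core of CMC can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `cross_modal_cutmix`, the `patch_gallery` lookup from a word to candidate patch identifiers, and the `replace_prob` parameter. A real system would use patch embeddings retrieved by semantic similarity rather than string identifiers.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def cross_modal_cutmix(
    tokens: Sequence[str],
    patch_gallery: Dict[str, List[str]],
    replace_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Turn a text-only token sequence into a mixed word/patch sequence.

    `patch_gallery` (hypothetical) maps a visually grounded word to patch ids
    with similar semantics. Words without gallery entries are never replaced.
    """
    if rng is None:
        rng = random.Random()
    mixed: List[Tuple[str, str]] = []
    for tok in tokens:
        candidates = patch_gallery.get(tok)
        # Replace a grounded word with a randomly chosen patch with
        # probability `replace_prob`; otherwise keep the original word.
        if candidates and rng.random() < replace_prob:
            mixed.append(("patch", rng.choice(candidates)))
        else:
            mixed.append(("word", tok))
    return mixed
```

The output keeps the sequence length and token order of the input sentence, so only the modality of individual tokens changes.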
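Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite recent efforts to relax these assumptions, existing works are only able to relax one of the two factors, leaving the strong assumption on the other factor intact. This raises an important open problem: can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper, we answer the question in the positive. We analyze a simple algorithm based on the primal-dual formulation of MDPs, in which the dual variables (discounted occupancy) are modeled using a density-ratio function against the offline data. With proper regularization, we show that the algorithm enjoys polynomial sample complexity under only realizability and single-policy concentrability. We also provide alternative analyses based on different assumptions to shed light on the nature of primal-dual algorithms for offline RL.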
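For orientation, the primal-dual (Lagrangian) view of a discounted MDP with a density-ratio parameterization is commonly written as the saddle-point problem below. The notation is assumed here rather than taken from the abstract: $\gamma$ is the discount factor, $\mu_0$ the initial state distribution, $v$ the value function (primal variable), $d^{D}$ the offline data distribution, and $w$ the density ratio, so that $w \cdot d^{D}$ plays the role of the discounted occupancy. The regularization mentioned in the abstract is omitted.

```latex
\max_{w \ge 0} \; \min_{v} \;
(1-\gamma)\, \mathbb{E}_{s \sim \mu_0}\big[ v(s) \big]
+ \mathbb{E}_{(s,a,r,s') \sim d^{D}}
  \Big[ w(s,a)\, \big( r + \gamma\, v(s') - v(s) \big) \Big]
```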
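Recently, MLP-like vision models have achieved promising performance on mainstream visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield good representation power for deep recognition models. However, existing MLP-like models fuse tokens through static fusion operations, lacking adaptability to the contents of the tokens to be mixed. Thus, customary information fusion procedures are not effective enough. To this end, this paper presents an efficient MLP-like network architecture, dubbed DynaMixer, which resorts to dynamic information fusion. Critically, we propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed. To reduce the time complexity and improve the robustness, a dimensionality reduction technique and a multi-segment fusion mechanism are adopted. Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset without extra training data, performing favorably against state-of-the-art vision MLP models. When the number of parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy, surpassing existing MLP-like models of similar capacity. The code is available at https://github.com/ziyuwwang/dynamixer.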
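A content-dependent mixing matrix of the kind described above can be sketched in a few lines. This is a simplified reading of the abstract, not the released implementation: the layer shapes, the names `dynamic_mix`, `w_reduce`, and `w_mix`, and the row-wise softmax are assumptions, and the multi-segment fusion mechanism is left out.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for small illustrative shapes."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_mix(tokens: Matrix, w_reduce: Matrix, w_mix: Matrix) -> Matrix:
    """Mix N tokens with a matrix generated from the tokens themselves.

    tokens:   (N, D) token features
    w_reduce: (D, d) dimensionality reduction, with d much smaller than D
    w_mix:    (N*d, N*N) projection from flattened reduced tokens to logits
    """
    n = len(tokens)
    reduced = matmul(tokens, w_reduce)            # (N, d)
    flat = [v for row in reduced for v in row]    # (N*d,)
    logits = matmul([flat], w_mix)[0]             # (N*N,)
    # Reshape to (N, N) and normalize each row so it sums to one.
    mixing = [softmax(logits[i * n:(i + 1) * n]) for i in range(n)]
    return matmul(mixing, tokens)                 # (N, D)
```

Because the mixing matrix is computed from the flattened, reduced tokens, changing any token changes how all tokens are fused, which is the contrast with a static mixing matrix. The reduction to `d` dimensions keeps the size of `w_mix` manageable.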
This paper focuses on designing efficient models with few parameters and low FLOPs for dense prediction. Although CNN-based lightweight methods have achieved strong results after years of research, the trade-off between model accuracy and constrained resources still needs improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept, the Meta-Mobile Block, and we argue that the specific instantiation matters greatly for model performance even when the framework is shared. Motivated by this observation, we derive a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of EMO over state-of-the-art methods; e.g., EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while balancing accuracy and efficiency well.
We aim to bridge the gap between common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modelling human learning is difficult because how people learn varies from one person to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models of such learning, including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, for image recognition, low-resource language processing, and character recognition.
Despite significant progress in object categorization in recent years, a number of important challenges remain, chiefly the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and to address the problems of supervised, zero-shot, generalized zero-shot, and open set recognition within a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. The distance constraints ensure that labeled samples are projected closer to their correct prototypes in the embedding space than to others. We show that the resulting model improves supervised, zero-shot, generalized zero-shot, and large open set recognition, with a class vocabulary of up to 310K on the Animals with Attributes and ImageNet datasets.
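The distance constraint described above, that a labeled sample should lie closer to its correct prototype than to any other vocabulary atom, can be expressed as a hinge penalty. The sketch below is an unweighted simplification for illustration only: the function name `margin_violations`, the Euclidean distance, and the `margin` parameter are assumptions, and the weighting used in the paper's framework is not modeled.

```python
import math
from typing import Dict, Sequence


def margin_violations(
    embedding: Sequence[float],
    label: str,
    prototypes: Dict[str, Sequence[float]],
    margin: float = 1.0,
) -> float:
    """Sum of hinge penalties over all vocabulary atoms other than `label`.

    A term is zero when the sample is at least `margin` closer to its correct
    prototype than to the competing atom, and positive otherwise. `prototypes`
    may contain both supervised classes and unlabeled vocabulary words.
    """
    d_correct = math.dist(embedding, prototypes[label])
    return sum(
        max(0.0, margin + d_correct - math.dist(embedding, proto))
        for name, proto in prototypes.items()
        if name != label
    )
```

Because the sum ranges over every atom in `prototypes`, words with no labeled examples still shape the embedding by acting as competitors.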
We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure, meaning that certain parts of the state space move "fast" (and in a sense, are more influential) while other parts transition more "slowly." Such structure is common in real-world problems where sequential decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. Examples include: (1) service allocation for a multi-class queue with (slowly varying) stochastic costs, (2) a restless multi-armed bandit with an environmental state, and (3) energy demand response, where both day-ahead and real-time prices play a role in the firm's revenue. Models that fully capture these problems often result in MDPs with large state spaces and large effective time horizons (due to frequent decisions), rendering them computationally intractable. We propose an approximate dynamic programming algorithmic framework based on the idea of "freezing" the slow states, solving a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the upper-level MDP). We also extend the technique to a function approximation setting, where a feature-based linear architecture is used. On the theoretical side, we analyze the regret incurred by each variant of our frozen-state approach. Finally, we give empirical evidence that the frozen-state approach generates effective policies using just a fraction of the computational cost, while illustrating that simply omitting slow states from the decision modeling is often not a viable heuristic.